Figure 1: Left: the compressed frame. Mid-left: the intrinsic layer recovered by JAS [14]. Rest: the intrinsic layer $\mathcal{L}_I$ and the artifact layer $\mathcal{L}_A$ recovered by our proposed DSLP, respectively. Please see the zoomed-in patches for details.


Abstract 

Blocking artifacts frequently appear in compressed real-world images and video sequences, especially those coded at low bit rates. They are visually annoying and likely to hurt the performance of many computer vision algorithms. A compressed frame can be viewed as the superimposition of an intrinsic layer and an artifact layer. Recovering the two layers from such frames is a severely ill-posed problem, since the number of unknowns to recover is twice the number of given measurements. In this paper, we propose a simple and robust method to separate these two layers, which exploits structural layer priors including the gradient sparsity of the intrinsic layer and the independence of the gradient fields of the two layers. A novel Augmented Lagrangian Multiplier based algorithm is designed to solve the recovery problem efficiently and effectively. Extensive experimental results demonstrate the superior performance of our method over the state of the art, in terms of visual quality and simplicity.


1. Introduction 

With the emergence of mobile devices, the amount of user-captured and shared images and videos is rapidly increasing. Storing and transmitting such data requires huge storage space and wide bandwidth unless the file sizes are properly reduced. Image and video compression techniques have been designed to reduce the file size while preserving the visual quality of the frames. JPEG [1], MPEG and H.26x [22, 20] are classic and widely used standards, which employ the block Discrete Cosine Transform (DCT), owing to its good energy compaction and decorrelation properties, to achieve the compression. However, an inevitable problem of these standards is that, as the compression ratio increases, the fidelity of coded images degrades, i.e. details are ruined and artificial block boundaries appear. The compression artifacts are perceptually annoying and, more importantly, very likely to degrade the performance of many computer vision algorithms that are primarily designed for uncompressed images or videos, such as image enhancement [27, 19, 12, 25], feature extraction [26, 18], over-segmentation [10, 2, 17] and super-resolution [29, 7]. Hence, techniques for removing or reducing these artifacts are desirable.

Their flexibility with respect to existing codecs makes post-processing approaches attractive: they handle compressed frames at the decoder end, without changing the mature structure of existing codecs. Mathematically, the compressed image/video sequence $\mathcal{C}$ can be modeled as a linear combination of two components: $\mathcal{C} = \mathcal{L}_I + \mathcal{L}_A$, where $\mathcal{L}_I$ and $\mathcal{L}_A$ represent the intrinsic layer and the artifact layer, respectively (e.g. Fig. 1). In the last decades, significant research effort has been devoted to post-processing style deblocking techniques, which can be broadly categorized into two groups, namely denoising-style deblockers and restoration-style ones.

The denoising-style deblockers attempt to suppress the effect of $\mathcal{L}_A$ by (adaptive) local filters. The very first work, proposed by Lim and Reeve [15], employs a low-pass filter on block boundaries, which may also blur intrinsic edges of the image. To address this problem, techniques that adaptively perform filtering on regions obtained by either classification or detection have been proposed [21, 8]. The recent video coding standard, H.264/AVC [20], analyzes artifacts and chooses different filters for different block boundaries according to their local properties. WNNM [11] and (V)BM3D [6] have the same goal of reducing artifacts, although they were originally designed for denoising by utilizing repetitive patterns in the target images or videos. These filtering methods treat the artifacts as noise to be smoothed for visual improvement. However, in general, this kind of deblocker aims at heuristically smoothing visible artifacts without an objective criterion, instead of genuinely restoring the original information.

Alternatively, the restoration-style methods focus on recovering $\mathcal{L}_I$ under some assumptions. Various priors have been exploited [9, 4, 23, 3]. Jung et al. attempt to reconstruct the intrinsic layer via sparse representation, which, however, requires that the compression ratio be known and the dictionary be well learned [13]. Similarly to [13], Choi et al. [5] propose a learning based approach to reduce JPEG artifacts for providing more accurate results in image matting. More recently, Sun and Liu [24] introduce a non-causal temporal prior for video deblocking, which iteratively refines the target frames and the estimation of motion across them. Due to the iterative procedure and the optical flow estimation, the computational load of this approach is very heavy, which limits its applicability. Li et al. [14] develop a four-step method including structure-texture decomposition, scene detail extraction, block artifact reduction and layer recomposition. This approach can produce promising results when the whole image, or a large part of it, is poorly textured; in other words, the block artifacts in poorly textured regions are well suppressed. Otherwise, its performance degrades sharply. Usually, the results recovered by the restoration-style methods are of better quality than those by the denoising ones. But they are either time consuming and complex (hard to apply to real-world tasks), or case dependent (lacking generality).

As can be seen from the aforementioned methods, the characteristics of the two layers have been well investigated individually; the relationship between the two layers, however, has rarely been studied. In this paper, we show how to decompose an image or a video sequence into its intrinsic and artifact layers by exploiting strong structural priors on the two layers. The main contributions of this paper can be summarized as follows:

• We propose an effective one-step visual data deblocking method, DSLP, that harnesses two structural layer priors, i.e. 1) the independence between the gradient fields of the two layers, and 2) the sparsity of the gradient field of the intrinsic layer, in a unified fashion.

• We design a novel Augmented Lagrange Multiplier based algorithm to efficiently and effectively seek the solution of the associated optimization problem. To demonstrate the efficacy and the superior performance of the proposed algorithm over the state-of-the-art alternatives, extensive experiments are conducted.

2. Deblocking using Structural Layer Priors 

2.1. Notations 

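We first introduce some notations used in this paper. Lowercase letters ($a, b, \ldots$) denote scalars, bold lowercase letters ($\mathbf{a}, \mathbf{b}, \ldots$) vectors, and bold uppercase letters ($\mathbf{A}, \mathbf{B}, \ldots$) matrices. Specifically, $\mathbf{I}$ and $\mathbf{1}$ stand for the identity matrix and the matrix of all ones with compatible dimensions. The vectorization operation $\mathrm{vec}(\mathbf{A})$ converts a matrix into a vector. Bold calligraphic uppercase letters ($\mathcal{A}, \mathcal{B}, \ldots$) represent high-order tensors. $\mathcal{A} \in \mathbb{R}^{D_1 \times D_2 \times \cdots \times D_n}$ denotes an $n$-order tensor, whose elements are represented by $a_{d_1,d_2,\ldots,d_n} \in \mathbb{R}$. $\mathbf{a}_{d_1,\ldots,d_{k-1},:,d_{k+1},\ldots,d_n} \in \mathbb{R}^{D_k}$ means the mode-$k$ fiber of $\mathcal{A}$ at $\{d_1,\ldots,d_{k-1},d_{k+1},\ldots,d_n\}$, which is the higher-order analogue of matrix rows and columns. The Frobenius and $\ell_1$ norms of $\mathcal{A}$ are respectively defined as

$$\|\mathcal{A}\|_F := \sqrt{\sum_{d_1,d_2,\ldots,d_n} |a_{d_1,d_2,\ldots,d_n}|^2}, \qquad \|\mathcal{A}\|_1 := \sum_{d_1,d_2,\ldots,d_n} |a_{d_1,d_2,\ldots,d_n}|,$$

while the $\ell_0$ norm $\|\mathcal{A}\|_0$ is the number of non-zero elements in $\mathcal{A}$. The inner product of two tensors of identical size is computed as $\langle\mathcal{A},\mathcal{B}\rangle := \sum_{d_1,d_2,\ldots,d_n} a_{d_1,d_2,\ldots,d_n} \cdot b_{d_1,d_2,\ldots,d_n}$. $\mathcal{S}_{\mathcal{W}}[\mathcal{A}]$ represents the non-uniform shrinkage operator, which is defined for each element in $\mathcal{A}$ as $\mathcal{S}_{w_{d_1,d_2,\ldots,d_n}}[a_{d_1,d_2,\ldots,d_n}] := \mathrm{sgn}(a_{d_1,d_2,\ldots,d_n}) \cdot \max(|a_{d_1,d_2,\ldots,d_n}| - w_{d_1,d_2,\ldots,d_n}, 0)$. $\mathcal{A} \odot \mathcal{B}$ means the Hadamard product of two tensors of the same size. The mode-$k$ unfolding of $\mathcal{A}$ converts the tensor into a matrix, i.e. $\mathrm{unfold}(\mathcal{A}, k) := \mathbf{A}_{[k]} \in \mathbb{R}^{D_k \times \prod_{i \neq k} D_i}$. Moreover, we denote $\mathbf{a}_{[k]} := \mathrm{vec}(\mathbf{A}_{[k]}) \in \mathbb{R}^{\prod_{i=1}^{n} D_i \times 1}$. The mode-$k$ folding transforms $\mathbf{A}_{[k]}$ back to $\mathcal{A}$, say $\mathrm{fold}(\mathbf{A}_{[k]}, k) := \mathcal{A}$. And the operator $\mathrm{reshape}(\mathbf{a}_{[k]}, k)$ reshapes $\mathbf{a}_{[k]}$ back to $\mathbf{A}_{[k]}$. It is clear that, for any $k$, $\|\mathcal{A}\|_F = \|\mathbf{A}_{[k]}\|_F = \|\mathbf{a}_{[k]}\|_F$, $\|\mathcal{A}\|_1 = \|\mathbf{A}_{[k]}\|_1 = \|\mathbf{a}_{[k]}\|_1$, $\|\mathcal{A}\|_0 = \|\mathbf{A}_{[k]}\|_0 = \|\mathbf{a}_{[k]}\|_0$, and $\langle\mathcal{A},\mathcal{B}\rangle = \langle\mathbf{A}_{[k]},\mathbf{B}_{[k]}\rangle = \langle\mathbf{a}_{[k]},\mathbf{b}_{[k]}\rangle$.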
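The unfolding, folding and shrinkage operators above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names `unfold`, `fold` and `shrink` are chosen here for convenience, and the column ordering of the unfolding (row-major over the remaining modes) is an assumption, since the text does not fix one.

```python
import numpy as np

def unfold(A, k):
    """Mode-k unfolding (k is 1-based): each column is one mode-k fiber of A."""
    return np.moveaxis(A, k - 1, 0).reshape(A.shape[k - 1], -1)

def fold(M, k, shape):
    """Inverse of unfold: rebuild a tensor of the given shape from its mode-k unfolding."""
    rest = [d for i, d in enumerate(shape) if i != k - 1]
    return np.moveaxis(M.reshape([shape[k - 1]] + rest), 0, k - 1)

def shrink(A, W):
    """Non-uniform shrinkage: sgn(a) * max(|a| - w, 0), applied element by element."""
    return np.sign(A) * np.maximum(np.abs(A) - W, 0.0)

# A small 3-order tensor and its mode-2 unfolding.
A = np.arange(24, dtype=float).reshape(2, 3, 4) - 10.0
A2 = unfold(A, 2)
shrunk = shrink(np.array([-3.0, 0.5, 2.0]), np.array([1.0, 1.0, 0.5]))
```

Because unfolding only rearranges entries, the Frobenius, $\ell_1$ and $\ell_0$ norms of `A` and `A2` coincide, which is the invariance stated at the end of the paragraph.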

2.2. Problem Formulation 

To be general, we employ tensors as the information container. For instance, a gray image is a 2-order tensor, a color image a 3-order one, and a color video a 4-order one. Recall that the compressed image or video sequence is the superimposition of the intrinsic and artifact components: $\mathcal{C} = \mathcal{L}_I + \mathcal{L}_A$. From this model, however, we can see that the number of unknowns to be recovered is twice that of the given measurements, which indicates that the problem is highly ill-posed. Therefore, without additional knowledge, the decomposition problem is intractable: it has infinitely many solutions, and it is impossible to identify which of these candidate solutions is indeed the "correct" one. To make the problem well-posed, we impose additional structural layer priors on the desired solution for $\mathcal{L}_I$ and $\mathcal{L}_A$. Before detailing the structural layer priors and the formulation of the problem, we first define the tensor mode-$k$ derivative response and the generalized tensor gradient.

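Definition 1. (Tensor Mode-k Derivative Response.) The derivative response of an $n$-order tensor $\mathcal{A}$ along mode-$k$ ($k \in \{1, 2, \ldots, n\}$) fibers is defined as:

$$\partial(\mathcal{A}, k) := \mathrm{fold}(\mathbf{f}_{\perp} * \mathbf{A}_{[k]}, k),$$

where $\mathbf{f}_{\perp}$ is the vertical derivative filter and $*$ is the operator of convolution.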

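Definition 2. (Generalized Tensor Gradient.) The generalized gradient of an $n$-order tensor $\mathcal{A}$ is defined as:

$$\nabla\mathcal{A} := \{\partial(\mathcal{A}, 1), \partial(\mathcal{A}, 2), \ldots, \partial(\mathcal{A}, n)\},$$

which is analogous to the definition of the matrix gradient.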

Please notice that, for an image $\in \mathbb{R}^{w \times h \times c}$ ($w$, $h$ and $c$ are its width, height and number of color channels, respectively) and a video sequence $\in \mathbb{R}^{w \times h \times c \times t}$ ($t$ is the number of frames), the derivative response across different color channels typically does not have statistical meaning, and is therefore omitted in the rest of the paper. Furthermore, for clarity, we denote $\nabla_1$ and $\nabla_2$ as the spatial response operators in the vertical and horizontal directions, respectively, and $\nabla_3$ as the temporal response operator. As a consequence, the gradient of images is $\nabla := \{\nabla_1, \nabla_2\}$ and the gradient of videos is $\nabla := \{\nabla_1, \nabla_2, \nabla_3\}$.
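As a rough illustration of Definitions 1 and 2, the sketch below computes mode-$k$ derivative responses for a color video and skips the color mode. The forward-difference filter and the circular boundary handling are assumptions made for this example (the text only names a "vertical derivative filter"), and the function names are hypothetical.

```python
import numpy as np

def mode_k_derivative(A, k):
    """Forward difference along mode k (1-based), with circular boundaries (assumed)."""
    return np.roll(A, -1, axis=k - 1) - A

def generalized_gradient(A, modes):
    """Collect the derivative responses along the requested modes."""
    return [mode_k_derivative(A, k) for k in modes]

# A toy w x h x c x t video; mode 3 (color) is omitted, as in the text.
video = np.random.default_rng(0).random((6, 5, 3, 4))
grads = generalized_gradient(video, modes=(1, 2, 4))
```

Each response has the same shape as the input tensor, so the two spatial responses and the temporal response can be stacked or vectorized together.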

Structural layer priors for the problem. It is well known that natural images and videos are largely piecewise smooth in both the spatial and temporal domains, so the gradient field of the intrinsic component is typically sparse. We call this the gradient sparsity prior. In addition, the gradient fields of the two layers should be statistically (approximately) uncorrelated. We refer to this as the gradient independence prior. Furthermore, we observe that the magnitude of the artifact layer in pixel values is usually much smaller than that of the intrinsic layer.

Based on the priors and the observation stated above, the desired decomposition $(\mathcal{L}_I, \mathcal{L}_A)$ should minimize the following objective:

$$\operatorname*{argmin}_{\mathcal{L}_I,\mathcal{L}_A} \|\mathcal{L}_A\|_F^2 + \sum_{j=1}^{J}\left(\alpha\|\nabla_j\mathcal{L}_I\|_0 + \beta\|\nabla_j\mathcal{L}_I \odot \nabla_j\mathcal{L}_A\|_0 + \gamma\|\mathcal{G}_j - \nabla_j\mathcal{L}_I - \nabla_j\mathcal{L}_A\|_F^2\right) \quad \text{s.t. } \mathcal{C} = \mathcal{L}_I + \mathcal{L}_A, \tag{1}$$

where $\alpha$, $\beta$ and $\gamma$ are the weights controlling the importance of the different terms, and $\mathcal{G}_j := \nabla_j\mathcal{C}$, which can be computed beforehand. $J$ can be either 2 for images or 3 for videos. In the objective function (1), the first term $\|\mathcal{L}_A\|_F^2$ restricts the artifact layer to be light, treating it as Gaussian noise. The second term $\alpha\|\nabla_j\mathcal{L}_I\|_0$ essentially enforces the recovered intrinsic layer to have a sparse gradient field. The remaining two terms constrain the gradient fields of the two layers to be independent of each other. More specifically, the third term $\beta\|\nabla_j\mathcal{L}_I \odot \nabla_j\mathcal{L}_A\|_0$ penalizes the overlap of the gradient fields of the two layers, while the fourth term $\gamma\|\mathcal{G}_j - \nabla_j\mathcal{L}_I - \nabla_j\mathcal{L}_A\|_F^2$ enforces that gradients that do not appear in the observation are not groundlessly generated in either layer, and that existing gradients are not gratuitously erased.

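The formulation of the problem (1) can be further simplified according to the following theorem.

Theorem 1. Suppose we are given an $n$-order tensor $\mathcal{A} \in \mathbb{R}^{D_1 \times D_2 \times \cdots \times D_n}$. There exists a functional matrix $\mathbf{F}_{pq} \in \mathbb{R}^{\prod_{i=1}^{n} D_i \times \prod_{i=1}^{n} D_i}$ satisfying $\mathrm{vec}(\mathrm{unfold}(\nabla_p\mathcal{A}, 1)) = \mathbf{F}_{pq}\mathbf{a}_{[q]}$, for any $p \in \{1, 2, \ldots, n\}$ and $q \in \{1, 2, \ldots, n\}$.

Proof. It is well known that $\mathrm{vec}(\mathrm{unfold}(\nabla_p\mathcal{A}, 1))$ can be alternatively computed by $\mathbf{F}_p\mathbf{a}_{[p]}$, where $\mathbf{F}_p \in \mathbb{R}^{\prod_{i=1}^{n} D_i \times \prod_{i=1}^{n} D_i}$ has the same functional behavior as the corresponding derivative filter. Similarly, there is a permutation matrix $\mathbf{P}_{pq}$ that relates $\mathbf{a}_{[p]}$ to $\mathbf{a}_{[q]}$, namely $\mathbf{a}_{[q]} = \mathbf{P}_{pq}^T\mathbf{a}_{[p]}$. So we have $\mathbf{F}_p\mathbf{a}_{[p]} = \mathbf{F}_p\mathbf{P}_{pq}\mathbf{P}_{pq}^T\mathbf{a}_{[p]} = \mathbf{F}_p\mathbf{P}_{pq}\mathbf{a}_{[q]}$ by the property of permutation matrices $\mathbf{P}_{pq}\mathbf{P}_{pq}^T = \mathbf{I}$, which indicates that $\mathbf{F}_{pq} := \mathbf{F}_p\mathbf{P}_{pq}$ is the desired matrix. □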
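Theorem 1 can be checked numerically on a small tensor. The sketch below builds a dense $\mathbf{F}_{pq}$ column by column, by applying the derivative to the tensor whose mode-$q$ vectorization is each standard basis vector. It assumes a circular forward-difference filter and a row-major `vec`; the helper names are hypothetical, and the dense construction is only meant for tiny examples.

```python
import numpy as np

def unfold(A, k):
    """Mode-k unfolding (k is 1-based)."""
    return np.moveaxis(A, k - 1, 0).reshape(A.shape[k - 1], -1)

def fold(M, k, shape):
    """Inverse of unfold for a tensor of the given shape."""
    rest = [d for i, d in enumerate(shape) if i != k - 1]
    return np.moveaxis(M.reshape([shape[k - 1]] + rest), 0, k - 1)

def deriv(A, p):
    """Circular forward difference along mode p (assumed derivative filter)."""
    return np.roll(A, -1, axis=p - 1) - A

def functional_matrix(shape, p, q):
    """Dense F_pq such that vec(unfold(deriv(A, p), 1)) == F_pq @ vec(unfold(A, q))."""
    n = int(np.prod(shape))
    F = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        # Tensor whose mode-q vectorization is the j-th basis vector.
        E = fold(e.reshape(shape[q - 1], -1), q, shape)
        F[:, j] = unfold(deriv(E, p), 1).ravel()
    return F

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3, 4))
F_21 = functional_matrix(A.shape, p=2, q=1)
F_23 = functional_matrix(A.shape, p=2, q=3)
lhs = unfold(deriv(A, 2), 1).ravel()
```

Both `F_21 @ unfold(A, 1).ravel()` and `F_23 @ unfold(A, 3).ravel()` reproduce `lhs`, which is exactly the statement that one matrix per $(p, q)$ pair suffices.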

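With the help of Theorem 1, the objective function (1) consequently turns out to be:

$$\operatorname*{argmin}_{\mathcal{L}_I,\mathcal{L}_A} \|\mathcal{L}_A\|_F^2 + \alpha\|\mathbf{F}\mathbf{l}_{I[1]}\|_0 + \beta\|\mathbf{F}\mathbf{l}_{I[1]} \odot \mathbf{F}\mathbf{l}_{A[1]}\|_0 + \gamma\|\mathbf{g} - \mathbf{F}\mathbf{l}_{I[1]} - \mathbf{F}\mathbf{l}_{A[1]}\|_F^2 \quad \text{s.t. } \mathcal{C} = \mathcal{L}_I + \mathcal{L}_A, \tag{2}$$

where $\mathbf{F} = [\mathbf{F}_{11}; \mathbf{F}_{21}; \ldots; \mathbf{F}_{J1}] \in \mathbb{R}^{J\prod_{i=1}^{n} D_i \times \prod_{i=1}^{n} D_i}$ and $\mathbf{g} = [\mathrm{vec}(\mathbf{G}_{1[1]}); \mathrm{vec}(\mathbf{G}_{2[1]}); \ldots; \mathrm{vec}(\mathbf{G}_{J[1]})] \in \mathbb{R}^{J\prod_{i=1}^{n} D_i \times 1}$. For the rest of this paper, we will, for brevity, substitute $\mathbf{l}_{I[1]}$ and $\mathbf{l}_{A[1]}$ with $\mathbf{l}_I$ and $\mathbf{l}_A$, respectively.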

2.3. Optimization 

As can be seen from the objective function (2), all the aforementioned priors and the observation have been taken into account in a unified optimization framework for recovering the two layers. However, the objective is difficult to optimize directly due to the non-convexity of the $\ell_0$ terms. Convex relaxation of these terms is an effective way to make the problem tractable. Hence, we replace the $\ell_0$ norm with its tightest convex surrogate, namely the $\ell_1$ norm. The optimization problem can be rewritten as:

$$\operatorname*{argmin}_{\mathcal{L}_I,\mathcal{L}_A} \|\mathcal{L}_A\|_F^2 + \alpha\|\mathbf{F}\mathbf{l}_I\|_1 + \beta\|\mathbf{F}\mathbf{l}_I \odot \mathbf{F}\mathbf{l}_A\|_1 + \gamma\|\mathbf{g} - \mathbf{F}\mathbf{l}_I - \mathbf{F}\mathbf{l}_A\|_F^2 \quad \text{s.t. } \mathcal{C} = \mathcal{L}_I + \mathcal{L}_A. \tag{3}$$
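To make the four terms of the relaxed objective concrete, the following sketch evaluates them for a candidate split of a gray image. It assumes circular forward differences for the two spatial derivative operators, and `dslp_objective` is a name introduced here purely for illustration.

```python
import numpy as np

def grad_stack(A):
    """Vertical and horizontal circular forward differences (assumed filters)."""
    return np.stack([np.roll(A, -1, axis=0) - A, np.roll(A, -1, axis=1) - A])

def dslp_objective(C, L_I, L_A, alpha, beta, gamma):
    """Value of the l1-relaxed objective for a candidate split (L_I, L_A) of C."""
    g, gI, gA = grad_stack(C), grad_stack(L_I), grad_stack(L_A)
    energy = np.sum(L_A ** 2)                          # artifact layer should be light
    sparsity = alpha * np.abs(gI).sum()                # sparse intrinsic gradients
    overlap = beta * np.abs(gI * gA).sum()             # gradient fields should not overlap
    fidelity = gamma * np.sum((g - gI - gA) ** 2)      # no gradients created or erased
    return energy + sparsity + overlap + fidelity

# A 4 x 4 image with a single vertical edge, and two trivial candidate splits.
C = np.tile(np.array([0.0, 0.0, 1.0, 1.0]), (4, 1))
all_intrinsic = dslp_objective(C, C, np.zeros_like(C), alpha=0.5, beta=30.0, gamma=6.0)
all_artifact = dslp_objective(C, np.zeros_like(C), C, alpha=0.5, beta=30.0, gamma=6.0)
```

For this toy edge, assigning everything to the intrinsic layer costs less than assigning everything to the artifact layer, in line with the intent that strong, sparse edges belong to the intrinsic layer.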


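The Augmented Lagrange Multiplier (ALM) with Alternating Direction Minimizing (ADM) strategy [16] has proven to be an efficient and effective solver for problems like (3). To adopt ALM-ADM for our problem, we need to make our objective function separable. Thus we introduce two auxiliary variables $\mathbf{u}$ and $\mathbf{v}$ to replace $\mathbf{F}\mathbf{l}_I$ and $\mathbf{F}\mathbf{l}_A$, respectively, in the objective function (3). Accordingly, $\mathbf{u} = \mathbf{F}\mathbf{l}_I$ and $\mathbf{v} = \mathbf{F}\mathbf{l}_A$ act as additional constraints. Naturally, the formulation (3) can be modified as:

$$\operatorname*{argmin} \|\mathcal{L}_A\|_F^2 + \alpha\|\mathbf{u}\|_1 + \beta\|\mathbf{u} \odot \mathbf{v}\|_1 + \gamma\|\mathbf{g} - \mathbf{u} - \mathbf{v}\|_F^2 \quad \text{s.t. } \mathcal{C} = \mathcal{L}_I + \mathcal{L}_A,\ \mathbf{u} = \mathbf{F}\mathbf{l}_I,\ \mathbf{v} = \mathbf{F}\mathbf{l}_A. \tag{4}$$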

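Converting the constrained minimization problem (4) into an unconstrained one gives the augmented Lagrangian function of (4) as follows:

$$\mathcal{L} = \|\mathcal{L}_A\|_F^2 + \alpha\|\mathbf{u}\|_1 + \beta\|\mathbf{u} \odot \mathbf{v}\|_1 + \gamma\|\mathbf{g} - \mathbf{u} - \mathbf{v}\|_F^2 + \Phi(\mathcal{X}, \mathcal{C} - \mathcal{L}_I - \mathcal{L}_A) + \Phi(\mathbf{y}_1, \mathbf{u} - \mathbf{F}\mathbf{l}_I) + \Phi(\mathbf{y}_2, \mathbf{v} - \mathbf{F}\mathbf{l}_A), \tag{5}$$

with the definition $\Phi(\mathcal{A}, \mathcal{B}) := \frac{\mu}{2}\|\mathcal{B}\|_F^2 + \langle\mathcal{A}, \mathcal{B}\rangle$, where $\mu$ is a positive penalty scalar and $\mathcal{X}$, $\mathbf{y}_1$ and $\mathbf{y}_2$ are the Lagrangian multipliers. Besides the Lagrangian multipliers, there are four variables to solve for, namely $\mathcal{L}_I$, $\mathcal{L}_A$, $\mathbf{u}$ and $\mathbf{v}$. The solver iteratively updates one variable at a time while fixing the others. Fortunately, each step has a simple closed-form solution, and hence can be computed efficiently. The solutions of the subproblems are as follows: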
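$\mathcal{L}_A$-subproblem: With other terms fixed, we have:

$$\mathcal{L}_A^{(t+1)} = \operatorname*{argmin}_{\mathcal{L}_A} \|\mathcal{L}_A\|_F^2 + \Phi(\mathcal{X}^{(t)}, \mathcal{C} - \mathcal{L}_I^{(t)} - \mathcal{L}_A) + \Phi(\mathbf{y}_2^{(t)}, \mathbf{v}^{(t)} - \mathbf{F}\mathbf{l}_A). \tag{6}$$

For computing $\mathcal{L}_A^{(t+1)}$, we take the derivative of (6) with respect to $\mathcal{L}_A$ and set it to zero, which gives:

$$\left(\mathbf{F}^T\mathbf{F} + \left(\tfrac{2}{\mu^{(t)}} + 1\right)\mathbf{I}\right)\mathbf{l}_A = \mathbf{m}^{(t)} + \mathbf{F}^T\left(\mathbf{v}^{(t)} + \tfrac{\mathbf{y}_2^{(t)}}{\mu^{(t)}}\right), \tag{7}$$

where $\mathbf{m}^{(t)} := \mathrm{vec}\left(\mathbf{C}_{[1]} + \tfrac{\mathbf{X}_{[1]}^{(t)}}{\mu^{(t)}}\right) - \mathbf{l}_I^{(t)}$ for brevity. Directly calculating the inverse of the matrix $\mathbf{F}^T\mathbf{F} + (\tfrac{2}{\mu^{(t)}} + 1)\mathbf{I}$ is the intuitive way of solving for $\mathbf{l}_A$. But if the matrix size is relatively large, as in our problem, the inverse operation is very expensive. Fortunately, by assuming circular boundary conditions, we can apply FFT techniques to this problem, which enables us to efficiently compute the solution as:

$$\mathbf{l}_A^{(t+1)} = \mathcal{F}^{-1}\left(\frac{\mathcal{F}\left(\mathbf{m}^{(t)} + \mathbf{F}^T\left(\mathbf{v}^{(t)} + \tfrac{\mathbf{y}_2^{(t)}}{\mu^{(t)}}\right)\right)}{\mathcal{F}\left(\mathbf{F}^T\mathbf{F} + \left(\tfrac{2}{\mu^{(t)}} + 1\right)\mathbf{I}\right)}\right), \tag{8}$$

$$\mathcal{L}_A^{(t+1)} = \mathrm{fold}(\mathrm{reshape}(\mathbf{l}_A^{(t+1)}, 1), 1), \tag{9}$$

where $\mathcal{F}(\cdot)$ and $\mathcal{F}^{-1}(\cdot)$ stand for the FFT and inverse FFT operators, respectively. The division in (8) is element-wise.

$\mathcal{L}_I$-subproblem: Discarding the unrelated terms provides:

$$\mathcal{L}_I^{(t+1)} = \operatorname*{argmin}_{\mathcal{L}_I} \Phi(\mathcal{X}^{(t)}, \mathcal{C} - \mathcal{L}_I - \mathcal{L}_A^{(t+1)}) + \Phi(\mathbf{y}_1^{(t)}, \mathbf{u}^{(t)} - \mathbf{F}\mathbf{l}_I). \tag{10}$$

Similarly to the $\mathcal{L}_A$ subproblem, the updating of $\mathcal{L}_I^{(t+1)}$ can be done in the following manner:

$$\mathbf{l}_I^{(t+1)} = \mathcal{F}^{-1}\left(\frac{\mathcal{F}\left(\mathbf{n}^{(t)} + \mathbf{F}^T\left(\mathbf{u}^{(t)} + \tfrac{\mathbf{y}_1^{(t)}}{\mu^{(t)}}\right)\right)}{\mathcal{F}\left(\mathbf{F}^T\mathbf{F} + \mathbf{I}\right)}\right), \tag{11}$$

$$\mathcal{L}_I^{(t+1)} = \mathrm{fold}(\mathrm{reshape}(\mathbf{l}_I^{(t+1)}, 1), 1), \tag{12}$$

with $\mathbf{n}^{(t)} := \mathrm{vec}\left(\mathbf{C}_{[1]} + \tfrac{\mathbf{X}_{[1]}^{(t)}}{\mu^{(t)}}\right) - \mathbf{l}_A^{(t+1)}$.

$\mathbf{u}$-subproblem: Let us now focus on updating $\mathbf{u}^{(t+1)}$, which corresponds to the following optimization problem:

$$\mathbf{u}^{(t+1)} = \operatorname*{argmin}_{\mathbf{u}} \alpha\|\mathbf{u}\|_1 + \gamma\|\mathbf{g} - \mathbf{u} - \mathbf{v}^{(t)}\|_F^2 + \beta\|\mathbf{u} \odot \mathbf{v}^{(t)}\|_1 + \Phi(\mathbf{y}_1^{(t)}, \mathbf{u} - \mathbf{F}\mathbf{l}_I^{(t+1)}). \tag{13}$$

The closed-form solution is obtained by:

$$\mathbf{u}^{(t+1)} = \mathcal{S}_{\frac{\alpha\mathbf{1} + \beta|\mathbf{v}^{(t)}|}{2\gamma + \mu^{(t)}}}\left[\frac{2\gamma(\mathbf{g} - \mathbf{v}^{(t)}) + \mu^{(t)}\mathbf{F}\mathbf{l}_I^{(t+1)} - \mathbf{y}_1^{(t)}}{2\gamma + \mu^{(t)}}\right]. \tag{14}$$

$\mathbf{v}$-subproblem: The updating of $\mathbf{v}^{(t+1)}$ is analogous to that of $\mathbf{u}^{(t+1)}$. The associated optimization problem is:

$$\mathbf{v}^{(t+1)} = \operatorname*{argmin}_{\mathbf{v}} \gamma\|\mathbf{g} - \mathbf{u}^{(t+1)} - \mathbf{v}\|_F^2 + \beta\|\mathbf{u}^{(t+1)} \odot \mathbf{v}\|_1 + \Phi(\mathbf{y}_2^{(t)}, \mathbf{v} - \mathbf{F}\mathbf{l}_A^{(t+1)}). \tag{15}$$

Similarly, the closed-form solution of (15) is:

$$\mathbf{v}^{(t+1)} = \mathcal{S}_{\frac{\beta|\mathbf{u}^{(t+1)}|}{2\gamma + \mu^{(t)}}}\left[\frac{2\gamma(\mathbf{g} - \mathbf{u}^{(t+1)}) + \mu^{(t)}\mathbf{F}\mathbf{l}_A^{(t+1)} - \mathbf{y}_2^{(t)}}{2\gamma + \mu^{(t)}}\right]. \tag{16}$$

Multipliers and $\mu$: Besides, the multipliers and $\mu$ need to be updated, which can be simply accomplished by:

$$\begin{aligned}
\mathcal{X}^{(t+1)} &= \mathcal{X}^{(t)} + \mu^{(t)}(\mathcal{C} - \mathcal{L}_I^{(t+1)} - \mathcal{L}_A^{(t+1)}); \\
\mathbf{y}_1^{(t+1)} &= \mathbf{y}_1^{(t)} + \mu^{(t)}(\mathbf{u}^{(t+1)} - \mathbf{F}\mathbf{l}_I^{(t+1)}); \\
\mathbf{y}_2^{(t+1)} &= \mathbf{y}_2^{(t)} + \mu^{(t)}(\mathbf{v}^{(t+1)} - \mathbf{F}\mathbf{l}_A^{(t+1)}); \\
\mu^{(t+1)} &= \rho\mu^{(t)}, \quad \rho > 1.
\end{aligned} \tag{17}$$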
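The update rules (6) to (17) can be put together into a compact solver for a single gray image. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes circular forward differences for the two spatial derivative operators (so that $\mathbf{F}^T\mathbf{F}$ is diagonal in the 2-D DFT basis), and the values of `mu`, `rho`, `tol` and `max_iter` are hypothetical defaults chosen for the example. The function and helper names are likewise introduced here for illustration.

```python
import numpy as np

def grad(A):
    """F applied to an image: vertical and horizontal circular forward differences."""
    return np.stack([np.roll(A, -1, axis=0) - A, np.roll(A, -1, axis=1) - A])

def grad_T(G):
    """Adjoint operator F^T of grad."""
    return (np.roll(G[0], 1, axis=0) - G[0]) + (np.roll(G[1], 1, axis=1) - G[1])

def shrink(A, W):
    """Non-uniform shrinkage operator."""
    return np.sign(A) * np.maximum(np.abs(A) - W, 0.0)

def dslp(C, alpha=0.6, beta=30.0, gamma=6.0, mu=1.0, rho=1.5, tol=1e-6, max_iter=200):
    """Split a gray image C into (L_I, L_A); mu, rho, tol, max_iter are assumed values."""
    C = np.asarray(C, dtype=float)
    H, W = C.shape
    # Eigenvalues of F^T F under circular boundary conditions.
    lam = (2 - 2 * np.cos(2 * np.pi * np.arange(H) / H))[:, None] \
        + (2 - 2 * np.cos(2 * np.pi * np.arange(W) / W))[None, :]

    def solve(rhs, shift):
        # Solves (F^T F + shift * I) x = rhs with an element-wise division in the DFT domain.
        return np.real(np.fft.ifft2(np.fft.fft2(rhs) / (lam + shift)))

    g = grad(C)
    L_I, X = np.zeros_like(C), np.zeros_like(C)
    u, v = np.zeros_like(g), np.zeros_like(g)
    y1, y2 = np.zeros_like(g), np.zeros_like(g)
    for _ in range(max_iter):
        L_A = solve(C + X / mu - L_I + grad_T(v + y2 / mu), 2.0 / mu + 1.0)  # Eqs. (7)-(8)
        L_I = solve(C + X / mu - L_A + grad_T(u + y1 / mu), 1.0)             # Eq. (11)
        FI, FA = grad(L_I), grad(L_A)
        d = 2.0 * gamma + mu
        u = shrink((2 * gamma * (g - v) + mu * FI - y1) / d,
                   (alpha + beta * np.abs(v)) / d)                          # Eq. (14)
        v = shrink((2 * gamma * (g - u) + mu * FA - y2) / d,
                   beta * np.abs(u) / d)                                    # Eq. (16)
        R = C - L_I - L_A
        X, y1, y2 = X + mu * R, y1 + mu * (u - FI), y2 + mu * (v - FA)      # Eq. (17)
        mu *= rho
        if np.linalg.norm(R) <= tol * np.linalg.norm(C):
            break
    return L_I, L_A

# Toy input: one strong edge plus faint, hypothetical 8 x 8 block offsets.
yy, xx = np.mgrid[0:16, 0:16]
C = (xx >= 8).astype(float) + 0.05 * (((yy // 8) + (xx // 8)) % 2)
L_I, L_A = dslp(C)
```

Working directly on 2-D arrays avoids the explicit `vec`, `reshape` and `fold` steps of (9) and (12); these are only bookkeeping once the operators are applied through `np.roll` and the FFT. The returned layers sum to the input up to the stopping tolerance.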















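Algorithm 1: Deblocking using Structural Layer Priors

Input: The observed tensor $\mathcal{C} \in \mathbb{R}^{D_1 \times \cdots \times D_n}$; $\alpha > 0$; $\beta > 0$; $\gamma > 0$.
Init.: $\mu^{(0)} > 0$; $\rho > 1$; $t = 0$; $\mathcal{L}_I^{(0)} = \mathcal{X}^{(0)} = 0 \in \mathbb{R}^{D_1 \times \cdots \times D_n}$; $\mathbf{u}^{(0)} = \mathbf{v}^{(0)} = \mathbf{y}_1^{(0)} = \mathbf{y}_2^{(0)} = 0 \in \mathbb{R}^{J\prod_{i=1}^{n} D_i \times 1}$.
while not converged do
    $\mathbf{m}^{(t)} = \mathrm{vec}(\mathbf{C}_{[1]} + \mathbf{X}_{[1]}^{(t)}/\mu^{(t)}) - \mathbf{l}_I^{(t)}$;
    $\mathbf{l}_A^{(t+1)} = \mathcal{F}^{-1}\left(\frac{\mathcal{F}(\mathbf{m}^{(t)} + \mathbf{F}^T(\mathbf{v}^{(t)} + \mathbf{y}_2^{(t)}/\mu^{(t)}))}{\mathcal{F}(\mathbf{F}^T\mathbf{F} + (2/\mu^{(t)} + 1)\mathbf{I})}\right)$;
    $\mathcal{L}_A^{(t+1)} = \mathrm{fold}(\mathrm{reshape}(\mathbf{l}_A^{(t+1)}, 1), 1)$;
    $\mathbf{n}^{(t)} = \mathrm{vec}(\mathbf{C}_{[1]} + \mathbf{X}_{[1]}^{(t)}/\mu^{(t)}) - \mathbf{l}_A^{(t+1)}$;
    $\mathbf{l}_I^{(t+1)} = \mathcal{F}^{-1}\left(\frac{\mathcal{F}(\mathbf{n}^{(t)} + \mathbf{F}^T(\mathbf{u}^{(t)} + \mathbf{y}_1^{(t)}/\mu^{(t)}))}{\mathcal{F}(\mathbf{F}^T\mathbf{F} + \mathbf{I})}\right)$;
    $\mathcal{L}_I^{(t+1)} = \mathrm{fold}(\mathrm{reshape}(\mathbf{l}_I^{(t+1)}, 1), 1)$;
    $\mathbf{u}^{(t+1)} = \mathcal{S}_{\frac{\alpha\mathbf{1} + \beta|\mathbf{v}^{(t)}|}{2\gamma + \mu^{(t)}}}\left[\frac{2\gamma(\mathbf{g} - \mathbf{v}^{(t)}) + \mu^{(t)}\mathbf{F}\mathbf{l}_I^{(t+1)} - \mathbf{y}_1^{(t)}}{2\gamma + \mu^{(t)}}\right]$;
    $\mathbf{v}^{(t+1)} = \mathcal{S}_{\frac{\beta|\mathbf{u}^{(t+1)}|}{2\gamma + \mu^{(t)}}}\left[\frac{2\gamma(\mathbf{g} - \mathbf{u}^{(t+1)}) + \mu^{(t)}\mathbf{F}\mathbf{l}_A^{(t+1)} - \mathbf{y}_2^{(t)}}{2\gamma + \mu^{(t)}}\right]$;
    $\mathcal{X}^{(t+1)} = \mathcal{X}^{(t)} + \mu^{(t)}(\mathcal{C} - \mathcal{L}_I^{(t+1)} - \mathcal{L}_A^{(t+1)})$;
    $\mathbf{y}_1^{(t+1)} = \mathbf{y}_1^{(t)} + \mu^{(t)}(\mathbf{u}^{(t+1)} - \mathbf{F}\mathbf{l}_I^{(t+1)})$;
    $\mathbf{y}_2^{(t+1)} = \mathbf{y}_2^{(t)} + \mu^{(t)}(\mathbf{v}^{(t+1)} - \mathbf{F}\mathbf{l}_A^{(t+1)})$;
    $\mu^{(t+1)} = \rho\mu^{(t)}$; $t = t + 1$;
end
Output: $\mathcal{L}_I^{*} = \mathcal{L}_I^{(t-1)}$, $\mathcal{L}_A^{*} = \mathcal{L}_A^{(t-1)}$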


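For clarity, the procedure of solving the problem (3) is summarized in Algorithm 1. The algorithm terminates when $\|\mathcal{C} - \mathcal{L}_I^{(t+1)} - \mathcal{L}_A^{(t+1)}\|_F \le \delta\|\mathcal{C}\|_F$ for a small tolerance $\delta$, or when the maximal number of iterations is reached.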

3. Experiments 

Parameter Effect. Our model involves three free parameters, namely $\alpha$, $\beta$ and $\gamma$. We here test the effect of each parameter. Although quality assessment for the task of deblocking is questionable [28], we still employ some metrics to reflect the trend as the parameters vary. The most widely used full-reference quality assessment might be the peak signal-to-noise ratio (PSNR), which is mathematically simple, but does not correlate well with perceived visual quality. So we do not employ PSNR to quantitatively measure the performance in this paper. Alternatively, the structural similarity (SSIM) metric tries to measure how similar a pair of images are (the deblocked result and its original),



Figure 2: Top: the effect of $\alpha$ with $\beta$ and $\gamma$ fixed. Middle: the effect of $\beta$ with $\alpha$ and $\gamma$ fixed. Bottom: the effect of $\gamma$ with $\alpha$ and $\beta$ fixed. Left: the case with JPEG quality 10. Right: the case with JPEG quality 20.


which considers three aspects of similarity including luminance, contrast and structure, and thus is more appropriate than PSNR. In addition, we introduce a novel metric called gradient consistency (GC) to cooperate with SSIM, which is defined as follows:

$$GC(\mathcal{A}, \mathcal{B}) := \frac{\|\nabla\mathcal{A} - \nabla\mathcal{B}\|_F^2}{\prod_{i=1}^{n} D_i}, \tag{18}$$


where $\mathcal{A}$ is the reference and $\mathcal{B}$ the recovered result. GC measures the consistency of the gradients of the two. Please notice that the higher the SSIM the better, while the lower the GC the better. Because the dependence among the three parameters is complex, we test them separately. For $\alpha$, we fix $\beta$ and $\gamma$ to 30 and 6, respectively. As can be viewed in Fig. 2, the best $\alpha$ values shift from the range 0.6-0.7 for the case with JPEG quality 10 to the range 0.2-0.3 for the case with JPEG quality 20, in terms of both SSIM and GC. This result is consistent with the fact that heavier artifacts require stronger smoothing to eliminate. As for $\beta$, we can observe from the second row of Fig. 2 that it performs stably in the range [15, 100] for JPEG quality 10 and [5, 100] for JPEG quality 20. Similarly, the parameter $\gamma$ achieves high performance when it is set to a relatively large value in both cases shown in Fig. 2. For the remaining experiments, we fix $\beta$ and $\gamma$ to 30 and 6, respectively.
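The GC metric is straightforward to compute once a gradient operator is fixed. The sketch below is an assumption-laden illustration for gray images: it uses circular forward differences for the spatial gradient and the squared Frobenius norm in the numerator, neither of which is guaranteed to match the exact choices behind the reported numbers. The function name is hypothetical.

```python
import numpy as np

def gradient_consistency(A, B):
    """GC(A, B): squared gradient difference divided by the number of elements (lower is better)."""
    def grads(T):
        # Vertical and horizontal circular forward differences (assumed filters).
        return np.stack([np.roll(T, -1, axis=0) - T, np.roll(T, -1, axis=1) - T])
    return np.sum((grads(A) - grads(B)) ** 2) / A.size

reference = np.zeros((2, 2))
recovered = np.array([[0.0, 1.0], [0.0, 0.0]])
gc_value = gradient_consistency(reference, recovered)
```

Since only gradients enter the metric, adding a constant offset to the recovered image leaves GC unchanged, which is one reason it complements SSIM rather than replacing it.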

Convergence Speed. Figure 3 displays the convergence speed of the proposed Algorithm 1, without loss of generality, on the image shown in Fig. 1, in which the stop criterion
sharply drops to the level of 10“^ with about 30 iterations 
Figure 4: An illustrative example to reveal the difference between the TV model and our method.

Figure 3: The convergence speed of Algorithm 1 (horizontal axis: iteration).

and to 10“^ with 70 iterations. We also show four pairs of 
the separated layers at 3, 5, 30 and 70 iterations. We see
that the results at 30 iterations are very close to those at 70.

Relationship to TV model. From the objective function (3), we can observe that our model reduces to the anisotropic Total Variation (TV) model by disabling the third and fourth terms, i.e. the gradient independence prior. To demonstrate the benefit of the gradient independence prior, we conduct a comparison between TV and our method. To better view the difference, we do not introduce artifacts into the test. As shown in Fig. 4, a bigger $\alpha$ leads to more details being smoothed for both TV and DSLP. The difference is that, in terms of visual quality, TV smooths both the high-frequency and low-frequency information, while our DSLP eliminates weak textures but keeps dominant edges. Quantitatively, when setting $\alpha$ to 1.0, DSLP achieves 0.6302 SSIM and 80.41 GC, which are much better than those of TV, i.e. 0.4467 SSIM and 217.45 GC. The results for $\alpha = 0.5$ are analogous. Please note that even when increasing $\alpha$ to 1.5, DSLP can still provide a very promising result. From the viewpoint of the artifact layer, we further give an example in Fig. 5 to show the power of the independence prior. For better viewing, we amplify the artifact by a factor of 10. As can be seen, TV greatly filters textures with a very high false positive ratio (the details of the bird body), while DSLP mainly focuses on the block artifacts. The above experimental results reveal the relationship and the difference between TV and DSLP, and demonstrate the advantage of DSLP.

Figure 5: Visual comparison of the recovered artifact between TV and our proposed method. Panels: recovered artifact by DSLP, recovered artifact by TV.

Figure 6: Illustration of JPEG compression complication. Panels: Input, BM3D, DSLP, IDSLP.

IDSLP: Improved DSLP. Let us here revisit the complication of JPEG compression in terms of visual quality. As can be viewed in the first image of Fig. 6 (JPEG quality 10), there are actually two main issues, namely the staircase effect around block boundaries and the serration along image edges. Denoising techniques like BM3D [6] can reduce the serration in the frame, but hardly deal with the staircase effect, as shown in the second picture of Fig. 6 (setting $\sigma = 50$). As for DSLP, it is good at cleaning the staircase around block boundaries but leaves the serration (see the third picture in Fig. 6, setting $\alpha = 0.6$). Intuitively, we can further improve the visual quality by making use of their respective advantages. The rightmost result in Fig. 6 demonstrates the effectiveness of such a strategy; it is obtained by first executing the denoising technique (in this paper we adopt BM3D, $\sigma = 25$) and then applying DSLP to the denoised version ($\alpha = 0.3$).

Image Deblocking. In this part, we evaluate the performance of our method on image deblocking, compared with the state-of-the-art alternatives including a reconstruction based method using Field of Experts (FoE) [23], a local filtering based method via Shape Adaptive DCT (SADCT) [8], a layer decomposition based method for JPEG Artifact Suppression (JAS) [14], a denoising based method BM3D [6], a Total Variation regularized restoration method (TV) [4], and our proposed DSLP and IDSLP. The codes for the competitors are either downloaded from the authors' websites or provided by the authors, and their parameters are tuned or set as suggested by the authors to obtain their best possible results. As for DSLP on image deblocking, only spatial gradients are taken into account, i.e. $\nabla := \{\nabla_1, \nabla_2\}$. In addition, all the codes are implemented in Matlab, which assures the fairness of the time cost comparison. We provide the quantitative (SSIM, GC and Time) and qualitative results on several images in Fig. 7, which are compressed by JPEG with quality 10. As can be seen from Fig. 7, FoE, SADCT, JAS and BM3D can only slightly suppress, but not thoroughly eliminate, the staircase effect under such a compression rate. DSLP is able to eliminate or largely reduce the staircase, while IDSLP can further mitigate the effect of edge serration. In terms of computational cost, DSLP is superior to SADCT and FoE, and competitive with JAS and TV, but inferior to BM3D. Moreover, IDSLP integrates the denoising and deblocking components, and thus its time cost is the sum of those of BM3D (for this paper) and DSLP. Owing to limited space and the nature of the deblocking problem, please see the supplementary material for larger and additional results, which are best viewed at their original sizes.

Figure 7: Performance comparison among FoE [23], SADCT [8], JAS [14], BM3D [6], TV [4], DSLP and IDSLP on image deblocking. Besides the visual results, three quantitative metrics are reported, i.e. SSIM/GC/Time(s).

Figure 8: Visual comparison of video deblocking (16 frames). Two rows correspond to two sample frames.

Video Deblocking. For this task, we test both spatial-only gradients $\nabla := \{\nabla_1, \nabla_2\}$ and spatial-temporal gradients $\nabla := \{\nabla_1, \nabla_2, \nabla_3\}$ for (I)DSLP, which are denoted as (I)DSLP and (I)VDSLP, respectively. This comparison involves VBM3D, which is a video extension of BM3D, as well as DSLP, IDSLP and IVDSLP.^ From Fig. 8, we can see that the problem of BM3D on image deblocking still exists for VBM3D on video deblocking. In other words, the staircase remains (see yellow arrows). DSLP significantly reduces the staircase effect, while IDSLP and IVDSLP further take care of the serration. We note that, compared with IDSLP, IVDSLP slightly excludes some textures (e.g. the leaves in the top-right corner, white arrows). This is because the temporal gradient is enforced to be sparse, which is more helpful for videos with slow motion, but over-smooths the content of videos with sudden or fast motion. More video results can be found in the supplementary material.

4. Conclusion 

Artifact separation from images or video sequences is an important, yet severely ill-posed problem. To overcome its difficulty, this paper has shown how to harness two structural priors of the intrinsic and artifact layers, namely the gradient sparsity of the intrinsic layer and the gradient independence between the two components, to make the problem well-defined and feasible to solve. We have formulated the problem in a unified optimization framework and proposed an efficient algorithm to solve it. The experimental results, compared to the state of the art, have demonstrated the clear advantages of the proposed method in terms of visual quality and simplicity. The method can be used for many advanced image/video processing tasks.


^Another related video deblocking method is [24], but its code was not available when this paper was prepared. Therefore, we do not compare with it. Moreover, with regard to time cost, as the authors of [24] stated, their C++ implementation takes about 3 hours to process a 32-frame 640 x 480 sequence, which significantly limits its applicability.